Fix multipart direct upload buffering chunks in memory by JR-1991 · Pull Request #52 · gdcc/python-dvuploader

JR-1991 · 2026-02-02T16:53:23Z

This pull request refactors the chunked file upload process to improve efficiency and flexibility, especially when handling large files. The main changes involve switching from loading entire chunks into memory to streaming file data directly from disk, supporting both in-memory and asynchronous file readers, and making chunk size handling more robust.

Use aiofiles' AsyncBufferedReader to stream file data instead of loading each chunk into memory. Add DEFAULT_CHUNK_SIZE and import Union (from typing) and AsyncBufferedReader. _chunked_upload now reads and uploads slices directly from the file, passing chunk_size to _upload_chunk. _upload_chunk accepts either BytesIO or AsyncBufferedReader and uses chunk_size to set the Content-Length header for streamed reads. upload_bytes now supports AsyncBufferedReader; it stops reading once chunk_size bytes have been sent and updates the hash function as it reads. These changes reduce memory usage and ensure correct chunk sizing for multipart uploads.
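To illustrate the idea of reading a bounded slice in small blocks while updating a hash, here is a minimal sketch. It is not the code from this PR: `stream_chunk`, `demo`, and `BLOCK_SIZE` are hypothetical names, and a `BytesIO` stands in for the file. The reader may be synchronous or asynchronous; an awaitable result from `read` is awaited, which is roughly how an aiofiles reader would behave.

```python
import asyncio
import hashlib
import inspect
from io import BytesIO

BLOCK_SIZE = 64 * 1024  # hypothetical read granularity, not taken from the PR


async def stream_chunk(reader, chunk_size, hash_func, block_size=BLOCK_SIZE):
    """Yield at most chunk_size bytes from reader, one small block at a time.

    Works with a sync reader (e.g. BytesIO) or an async one whose read()
    returns an awaitable. The hash is updated with every block yielded.
    """
    remaining = chunk_size
    while remaining > 0:
        data = reader.read(min(block_size, remaining))
        if inspect.isawaitable(data):
            data = await data
        if not data:  # end of file reached before the chunk was full
            break
        hash_func.update(data)
        remaining -= len(data)
        yield data


async def demo():
    payload = bytes(range(256)) * 1000  # 256,000 bytes of sample data
    reader = BytesIO(payload)
    hasher = hashlib.md5()
    sizes = []
    while True:
        sent = 0
        # In a real upload, each block would be written to the HTTP request
        # body; here we only count the bytes of each chunk.
        async for block in stream_chunk(reader, 100_000, hasher, block_size=4096):
            sent += len(block)
        if sent == 0:
            break
        sizes.append(sent)
    return sizes, hasher.hexdigest() == hashlib.md5(payload).hexdigest()
```

Only one block (here 4 KiB) is held in memory at a time, regardless of the chunk size. Because the number of bytes in each chunk is known up front (the chunk size, or less for the last chunk), the caller can declare the request length before streaming begins.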

JR-1991 self-assigned this Feb 2, 2026

JR-1991 added the bug label Feb 2, 2026

JR-1991 added this to PyDataverse Working Group Feb 2, 2026

JR-1991 merged commit b9fc079 into main Feb 2, 2026
12 checks passed

github-project-automation bot moved this to Done in PyDataverse Working Group Feb 2, 2026

JR-1991 merged 1 commit into main from fix-chunk-read